Comparative transcriptomics identifies candidate genes involved in the evolutionary transition from dehiscent to indehiscent fruits in Lepidium (Brassicaceae)

Background Fruits are the seed-bearing structures of flowering plants and are highly diverse in terms of morphology, texture and maturation. Dehiscent fruits split open upon maturation to discharge their seeds while indehiscent fruits are dispersed as a whole. Indehiscent fruits evolved from dehiscent fruits several times independently in the crucifer family (Brassicaceae). The fruits of Lepidium appelianum, for example, are indehiscent while the fruits of the closely related L. campestre are dehiscent. Here, we investigate the molecular and genetic mechanisms underlying the evolutionary transition from dehiscent to indehiscent fruits using these two Lepidium species as model system. Results We have sequenced the transcriptomes and small RNAs of floral buds, flowers and fruits of L. appelianum and L. campestre and analyzed differentially expressed genes (DEGs) and differently differentially expressed genes (DDEGs). DEGs are genes that show significantly different transcript levels in the same structures (buds, flowers and fruits) in different species, or in different structures in the same species. DDEGs are genes for which the change in expression level between two structures is significantly different in one species than in the other. Comparing the two species, the highest number of DEGs was found in flowers, followed by fruits and floral buds while the highest number of DDEGs was found in fruits versus flowers followed by flowers versus floral buds. Several gene ontology terms related to cell wall synthesis and degradation were overrepresented in different sets of DEGs highlighting the importance of these processes for fruit opening. Furthermore, the fruit valve identity genes FRUITFULL and YABBY3 were among the DEGs identified. Finally, the microRNA miR166 as well as the TCP transcription factors BRANCHED1 (BRC1) and TCP FAMILY TRANSCRIPTION FACTOR 4 (TCP4) were found to be DDEGs. Conclusions Our study reveals differences in gene expression between dehiscent and indehiscent fruits and uncovers miR166, BRC1 and TCP4 as candidate genes for the evolutionary transition from dehiscent to indehiscent fruits in Lepidium. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-022-03631-8.


Background
Flowering plants (angiosperms) form fruits to protect and disperse their seeds. Fruits come in many different types with different morphologies and different properties such as dry or fleshy, and dehiscent or indehiscent [1]. There is a tremendous variation in fruit types both across and within different plant lineages [2]. However, the evolutionary mechanisms that enabled such dramatic shifts to occur, often in a relatively short period of time, remain largely unknown.
The crucifer family (Brassicaceae) includes a number of economically important plants such as cabbage, broccoli, mustard, radish, and turnips. The model plant Arabidopsis thaliana is also a member of this family [3]. Typical fruits of Brassicaceae species are dehiscent, i.e. that the fruits open upon maturation to release the seeds. Dehiscent fruits also likely represents the ancestral fruit type of Brassicaceae [4]. However, indehiscent fruits, i.e. fruits that only release the seed upon decomposition of the fruit, are found in many tribes distributed across the Brassicaceae phylogeny [5]. The scattered distribution of indehiscent fruits indicates that this property evolved independently several times. This situation is mirrored in the genus Lepidium belonging to Brassicaceae: Species of this genus typically produce two-seeded dehiscent fruits, but the genus also includes species with indehiscent fruits [6].
Brassicaceae fruits are composed of two fruit valves that are connected to the replum and enclose the developing seeds. Dehiscent fruits, such as those of A. thaliana and Lepidium campestre (also known as field pepperwort or field cress), form a well-defined dehiscence zone (DZ) at the valve margin [7]. The DZ consists of the lignified layer, a stripe of lignified cells, and a separation layer, a region of small thin-walled cells [8,9]. During fruit ripening, the whole fruit dries and shrinks. Only the lignified structures stay rigid. Thereby a spring-like tension is created within the fruit. At the same time, the middle lamellae of the separation layer cells degenerate to form a pre-determined breaking zone at which the pressure tears the valves apart from the replum. Consequently, the fruit bursts open to release the seeds [9][10][11]. In contrast, the indehiscent fruits of the closely related Lepidium appelianum do not form a DZ. Instead, a continuous ring of lignified cells surrounds the seeds such that the fruit cannot open [7].
Much of the gene regulatory network underlying the proper formation of the fruit valves, replum and DZ has been elucidated in A. thaliana (reviewed in Ballester and Ferrándiz [12]). Establishment of the DZ requires expression of the two redundant MADS-box genes, SHAT-TERPROOF1 (SHP1) and SHATTERPROOF2 (SHP2).
For correct fruit patterning, it is crucial that the expression of the SHP genes is restricted to the DZ. Three genes encoding transcription factors contribute to this process: The MADS box gene FRUITFULL (FUL) which is expressed in the fruit valves [16,17], the BEL1-like homeobox gene REPLUMLESS (RPL) [18], also known as PEN-NYWISE [19], BELLRINGER [20], VAAMANA [21], and BLH9 [22] which is expressed in the replum, and the floral homeotic gene APETALA2 (AP2) which negatively regulates RPL [23].
Transcription factors controlling the expression of these regulators have also been determined. High levels of the C2H2 zinc finger proteins JAGGED (JAG) and the two closely related YABBY1 group proteins FILAMEN-TOUS FLOWER (FIL) and YABBY3 (YAB3) activate the expression of FUL [24]. In contrast, lower levels of JAG/ FIL/YAB3 expression promote expression of SHP genes. The expression of RPL is activated by the KNOTTED1like homeobox protein BREVIPEDICELLUS (BP) [25] whose gene is in turn activated by the C2H2 zinc finger protein NO TRANSMITTING TRACT (NTT) [26]. AP2 is negatively regulated by the microRNA miR172 [27].
In general, proteins encoded by genes expressed in the replum often negatively regulate genes expressed in the valves and vice versa. Apart from the already mentioned interactions, this includes negative regulation of the replum gene BP by the valve proteins encoded by JAG/FIL/YAB3, and negative regulation of JAG/FIL/YAB3 by the replum protein RPL [30].
In a previous study, we have shown that orthologues of the valve margin genes are expressed in a similar way in L. campestre (dehiscent fruits) as in A. thaliana fruits but that expression of the respective orthologues is abolished in the corresponding tissues of indehiscent Lepidium appelianum fruits [7]. As parallel mutations in different genes are unlikely, we concluded that the changes in gene expression patterns are probably caused by changes in upstream regulators such as FUL, RPL or AP2.
To conduct a more unbiased approach to identify the genetic changes that lead from dehiscent to indehiscent fruits than the analysis of candidate genes, we have sequenced the transcriptomes of floral buds, flowers and fruits of both, L. campestre and L. appelianum in the present study. We have identified differentially expressed genes (DEGs) and differently differentially expressed genes (DDEGs) where the latter refers to genes for which the change in expression level between two structures is significantly different in one species than in the other. More DEGs were identified in flowers than in fruits and floral buds and a higher number of DDEGs was found in fruits versus flowers than in flowers versus floral buds. Cell wall synthesis and degradation are important processes for fruit opening as revealed by gene ontology (GO) analysis. The fruit valve identity genes FRUITFULL and YABBY3 were identified as DEGs such that the possible cause for the evolutionary transition from dehiscent to indehiscent fruits in Lepidium may even be an upstream factor of these genes. Possible candidates are BRANCHED1 (BRC1), an ortholog of which may determine whether dehiscent or indehiscent fruits develop on the dimorphic plant Aethionema arabicum, and TCP FAMILY TRANSCRIPTION FACTOR 4 (TCP4) which may regulate YABBY3. These two genes were found to be DDEGs. Our study elucidates differences in gene expression patterns between dehiscent and indehiscent fruits and reveals BRC1 and TCP4 as possible causes for the evolutionary transition from dehiscent to indehiscent fruits in Lepidium.

Overview of the RNA-seq analysis and transcriptome assembly
Sequencing resulted in an average number of reads per library of 56 Mio. for the mRNA and 12 Mio. for the small RNA (Table 1). An initial analysis of the data revealed contamination with sequences from thrips, likely due to infestation of the plants by these insects. Hence, we removed reads matching to the genome of the thrips Frankliniella occidentalis [31] as well as uncorrectable and unpaired reads and reads corresponding to organelle sequences. After this filtering step, 42 Mio. were retained for further analyses for the mRNA sample. For the small RNA sample many reads seem to be derived from organelle RNA. Hence, after removing uncorrectable reads and those matching to the Frankliniella occidentalis genome and organelle sequences, only 1.5 Mio. reads remained on average for the small RNA sample (Table 1).
Assembly using Trinity [32] resulted in a total of 56,413 transcripts for L. campestre and 70,380 transcripts for L. appelianum after removing putative contaminant sequences but including potential splice variants or fragmentary sequences. The assemblies also contained chimeric sequences composed of two different transcripts which were likely a result of mis-assembly [33]. Separation of chimeric sequences increased the number of transcripts to a total of 57,209 for L. campestre and 71,332 for L. appelianum. We used the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool [34] with the dataset eudicotyledons_odb10 as reference to assess completeness of our transcriptomes. The BUSCO analyses revealed that 94.6% of the expected eudicotyledonous "near-universal single-copy orthologs" are present in our assembly of the L. campestre transcriptome while 94.3% of these BUSCOs are present in our L. appelianum transcriptome (Fig. 1). It is common that some genes are fragmented in de novo assemblies. Hence, we analyzed the length distribution of our assemblies. For both species there are two peaks (Fig. 2). One peak appears at a length of about 240 nucleotides (log 2 length of about 7.9) and probably represents fragments. The other peak was found at a length of about 1,450 nucleotides (log 2 length of about 10.5) which indicates that there are also a number of full-length transcripts.
To detect conserved miRNAs, we mapped the small RNA reads onto the mature miRNAs of A. thaliana as provided by miRBase [35]. We found reads for 64 mature miRNAs belonging to 32 miRNA families in the L. campestre small RNA data (Table 2). Using ShortStack [36] and the L. campestre genome as available from NCBI, we identified three novel miRNAs. However, no putative target genes could be identified in the transcriptome of L. campestre using targetfinder (https:// github. com/ carri ngton lab/ Targe tFind er). Our L. appelianum small RNA data contained reads of 60 mature miRNAs belonging to 30 miRNA families (Table 2). No novel miRNAs could be identified for L. appelianum using ShortStack and our transcriptome as reference "genome".
To assess completeness of our small RNA data, we compared our results to the set of conserved and moderately conserved miRNA families as identified by miRNA sample sequencing of vascular plants [37]. For both species, we identified reads for all 16 miRNA families that were found to have originated before the emergence of eudicots and to be conserved across virtually all corresponding species. Furthermore, we found reads for 6 miRNA families in our L. campestre and 7 miRNA families in our L. appelianum small RNA data out of 21 miRNA families which were classified as conserved, although missing in a few corresponding species.

Differential gene expression analysis
To conduct differential expression analysis, we identified putative ortholog pairs between the transcripts of the two Lepidium species. Thereby, we excluded ortholog pairs in which the shorter sequence was less half in length than the longer one and we kept only one transcript isoform per gene as described in the methods section. To make sure not to lose genes of interests using this conservative approach, we checked transcripts that were excluded from the ortholog transcriptome and that showed similarity to A. thaliana genes annotated to have "DNA-binding transcription factor activity" (GO:0003700) (Supplemental Table 1). Most of these transcripts do not seem to be involved in fruit development. The only transcripts which may have some relation to fruit development are BRANCHED2 (BRC2), MYB26, KANADI 2 (KAN2) and MYB85. BRC2 has similar but weaker effects on branching than BRC1 [38] and may have similar effects on dehiscence as will be discussed for BRC1 later. MYB26 has been shown to have a role in anther dehiscence [39] and hence a role for this TF also in fruit dehiscence is conceivable. KAN2 has been shown to repress ASYM-METRIC LEAVES2 (AS2) in A. thaliana [40]. AS2 itself is part of the fruit development network [25]. MYB85 has a role in lignin biosynthesis in A. thaliana [41]. As lignification is important for fruit dehiscence, MYB85 may also have a role in this process. Future transcriptome analyses with a deeper sampling may help to also include these genes in the analyses. However, in general, we do not seem to miss many factors with potential functions in fruit development. Finally, we attained two transcriptome datasets, one for L. campestre and one for L. appelianum, each containing 17,755 transcripts and where each transcript in one species has exactly one putative orthologous transcript in the other species. We will refer to these transcriptome datasets as our ortholog-transcriptomes in the following. We reassessed completeness of our ortholog-transcriptomes and found that 89.2% of the BUSCOs remained in our ortholog-assembly for L. campestre while this value was slightly lower at 89.1% for our L. appelianum ortholog-transcriptome.
Reads were mapped independently to the corresponding ortholog-transcriptome and counted using HTSeq-count [42]. A principal component analysis was conducted based on the normalized number of reads mapping to the ortholog-transcriptomes. As expected, the replicates from the same species and structure clustered together (Fig. 3). The species are separable based on first component which explains 54% of the variance while the structures are separable based on second component which explains 30% variance (Fig. 3).
To learn more about the differences in fruit development between the L. campestre and L. appelianum, we analyzed expression in our ortholog-transcriptomes using the programs DESeq2 [43] and edgeR [44]. We used a multi-factor design to not only be able to identify differentially expressed genes (DEGs) between the species in the same structure and between structures in the same species, but also to identify genes where the change in expression between the structures is different between the two species. We will refer to the genes identified in the latter analyses as differently differentially expressed genes (DDEGs). DESeq2 generally identified more DEGs and DDEGs than edgeR, but there is a great overlap of genes identified by both programs (Fig. 4, Supplemental Fig. 1). Only this overlap between the two methods will be considered in the following. More DEGs were observed between the same structure of the different species as compared to different structures of the same species. In L. campestre, there are similar numbers of DEGs between flower and bud as compared to fruit and flower. In L. appelianum, there are more than twice as many DEGs in flowers versus buds as compared to fruits versus flowers (Fig. 4). When looking at DEGs in the same structure of the different species, the highest number of DEGs is observed in flowers, followed by fruits and buds.
We also analyzed DDEGs in our dataset, i.e. genes which had a significantly different change in expression in flowers versus buds and in fruits versus flowers, respectively, in L. appelianum as compared to L. campestre. These genes may have a significantly stronger upor downregulation in L. appelianum as compared to L. campestre or these genes may be downregulated in one species and upregulated in the other species. We found 70 DDEGs in flowers versus buds and 158 DDEGs in fruits versus flowers when comparing the two species (Fig. 4).
We applied the same methods for the identification of DEGs and DDEGs encoding miRNAs. First, we determined orthologs between the miRNAs based on the A. thaliana miRNAs they mapped to. For 56 mature miR-NAs belonging to 28 miRNA families reads were found in the small RNA data for both species and these mature miRNAs could thus be used for differential expression analyses ( Table 2). We will refer to this dataset as our ortholog-miRNAs. All 16 miRNA families that are conserved across virtually all species according to [37] and 6 out of 21 miRNA families which were classified as moderately conserved belong to our ortholog-miRNAs dataset. Mapping of small RNA reads, counting and differential expression analyses were done as described for the differential expression analysis of the ortholog-transcriptomes.
Only one miRNA was found to be encoded by a DEG or DDEG by both programs DESeq2 and edgeR. The miRNA homologous to miR165a-3p, miR165b, miR166a-3p, miR166b-3p, miR166c, miR166d, miR166e-3p, miR166f and miR166g of Arabidopsis thaliana [45] (they all only differ by one nucleotide), which we will refer to as miR165a-3p, was found to be encoded by a DDEG when comparing fruits and flowers. Targets of miR165a-3p are HD-Zip transcription factors like PHABULOSA, REVO-LUTA and PHAVOLUTA [46]. However, the expression of these target genes does not change much in our transcriptome analyses (Supplemental Fig. 2). Hence, the role of miR165a-3p for fruit dehiscence remains to be clarified.

Gene Ontology and transcription factor analyses
A number of gene ontology (GO) terms [47,48] of the category molecular function are significantly over-or underrepresented in the DEGs and DDEGs (Table 3). Among them, the terms protein binding (GO:0005515) and RNA binding (GO:0003723) were underrepresented in two datasets of DEGs. Interestingly, several GO terms related to cell wall synthesis and degradation, i.e. pectinesterase activity (GO:0030599), cellulose synthase (UDP-forming) activity (GO:0016760), polygalacturonase activity (GO:0004650) and hydrolase activity, hydrolyzing O-glycosyl compounds (GO:0004553) were overrepresented in different sets of DEGs. As we were interested in differences in the gene regulatory network involved in fruit dehiscence in the two species, known to be largely composed of transcription factors in Arabidopsis thaliana (reviewed in Ballester and Ferrándiz [12]), we analyzed genes annotated to have "DNA-binding transcription factor activity" (GO:0003700) in more detail. This set includes transcription factors and transcriptional regulators. For simplicity, we will refer to this dataset as genes encoding transcription factors (TFs).
When comparing flowers and buds, 21 and 28 TFs were DEGs in L. campestre and in L. appelianum, respectively. Among them, there are 13 TFs that were DEGs comparing these structures in both species, including four genes with known functions in flower development, AGAMOUS-LIKE 104 (AGL104) [49], SPOROCYTELESS (SPL, also termed NOZZLE) [50], ORESARA1 (ORE1, also termed ANAC092, ATNAC2, ATNAC6) [51] and ZINC FINGER PROTEIN 2 (ZFP2) [52] (Table 4). Between fruits and flowers, there are 12 TFs in L. campestre and 23 TFs in L. appelianum that are DEGs. Five of these genes are DEGs in fruits versus flowers in both species (Table 4). TFs with differential expression between structures in both species are probably those TFs with common functions for flower and fruit development.

Extension of the gene regulatory network for fruit development
We next investigated how the TFs shown to be DDEGs between fruits and flowers may be involved in the gene regulatory network controlling fruit development (Fig. 5). Therefore, we searched for binding sites of the seven TFs identified by chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments in the promoters of the genes known to be involved in fruit development. On ChIP-Hub [66], no ChIP-seq data is available for BRC1 and for TRY.
Binding of OBP4 was found in the promoter of all but one of the 18 fruit development genes (Table 7). Binding of RVE6, MYB57, PIF1 and TCP4 was detected in the promoters of 11, 7, 5 and 2 fruit development genes, respectively. PIF1 predominantly binds to the promoters of valve identity genes, with binding to four out of eight valve identity gene promoters and apart from that only binding to one of five valve margin genes. ARF8 is the only fruit development gene for which none of the DDEGs was found to bind to its promoter. To the promoters of YAB3 and FUL, which were found to be

Transcriptomes and small RNA datasets of L. campestre and L. appelianum are nearly complete
We have sequenced the transcriptomes of floral buds, flowers and fruits of L. campestre and L. appelianum.
Benchmarking of Universal Single-Copy Orthologs (BUSCO) analysis revealed that the transcriptome assemblies of the two species contain more than 94% of the eudicotyledonous "near-universal single-copy orthologs". This number is similar to or more than that for transcriptome assemblies of other Brassicaceae [67][68][69].
Furthermore, we found members of all 16 miRNA families that were found to have originated before the emergence of eudicots and conserved in eudicots [37]. These findings reveal that our transcriptome and small RNA data includes most of the expected transcripts and miRNAs.

Differences in gene expression mainly between floral structures
We identified more DEGs when comparing the same structure between the two species than comparing different structures in the same species (Fig. 4). This indicates that gene regulation has diverged between the two species. This is different to what has been observed other flowering plant species, where the correlation of gene expression is higher in the same structure of different species than in different structures of the same species [70]. However, in this case microarray expression data was analyzed which may select for conserved genes. The highest number of DEGs was observed in flowers, followed by fruits and buds, and the highest Table 4 Genes that are differentially expressed in different structures in both species and that are annotated as "DNA-binding transcription factor activity" (GO:0003700). Reg., regulation; L.c., L. campestre; L.a., L. appelianum number of DDEGs was found in fruits versus flowers as compared to flowers versus floral buds. This indicates that the differences in gene expression between L. campestre and L. appelianum are most pronounced between flowers and in the transition from flowers to fruits. This is expected as the developmental program leading to fruit dehiscence or indehiscence needs to be initiated before the fruits are formed. Supportingly, in Aethionema arabicum, a plant that develops dehiscent and indehiscent fruits on the very same individual, differences between the fruit types start to occur early, two days after anthesis [71]. A number of GO terms related to cell wall synthesis and degradation, e.g. pectinesterase activity, cellulose synthase activity and polygalacturonase activity were overrepresented in different sets of DEGs. It has been recognized that secondary cell wall formation at the valve margins [72] and degeneration of cell walls in the separation layer are essential processes for fruit dehiscence after the DZ is correctly specified [73]. Hence, the overrepresentation of GO terms related to cell wall synthesis and degradation is not surprising.

Confirmation of previous expression study
In a previous study, we have compared expression of the valve margin genes as well as the valve gene FUL and the replum gene RPL between L. campestre and L. appelianum by in situ hybridization [7]. We showed that their orthologues from L. campestre (dehiscent fruits) are similarly expressed as in A. thaliana while expression of the respective orthologues is abolished in valve margins of indehiscent L. appelianum fruits. Analysis using qRT-PCR revealed that the valve margin genes IND and SHP1 are expressed at a significantly higher level in flowers and early fruits (the fruit stage for which the transcriptome was sequenced here) of L. campestre than in L. appelianum. Significantly higher expression was confirmed in the present study for SHP1 in flowers (Fig. 5). qRT-PCR analysis revealed significantly higher expression of SHP2 in flowers and of ALC in early fruits in L. appelianum [7]. Expression was not significantly different in the present transcriptome analysis, but expression was also found to be higher for SHP2 in flowers and for ALC in fruits. Like qRT-PCR analysis, our transcriptome analysis also found significantly higher expression of FUL in fruits of L. campestre. Similarly, RPL was found to be expressed at a higher level in the flowers of L. campestre though the difference was only found to be significant in qRT-PCR but not in transcriptome analysis. AP2 was found to be expressed at a lower level in flowers and fruits of L. campestre by both analyses. Again, the difference was significant in qRT-PCR analysis but not in transcriptome analysis. Hence, our transcriptome analysis is in good agreement with the previous qRT-PCR analyses, but in our transcriptome analysis differences were less often significant than in the qRT-PCR analyses.

Known flower and fruit development genes are differentially expressed
To identify differences in the regulation of flower and fruit development between L. campestre and L. appelianum, we focused on differentially expressed or differently differentially expressed genes encoding transcription factors (TFs  differences in the expression of the identified flowering time genes. As mentioned above, the fruit development genes SHP1 and FUL were found to be differentially expressed. FUL is expressed at a significantly lower level in L. appelianum in all three structures. In A. thaliana, the ful knockout mutation causes indehiscence [16,17]. FUL represses the expression of SHP1 and SHP2. Hence, lower expression of FUL in L. appelianum should lead to higher expression of SHP1 and SHP2 in this species. However, only for SHP2 a non-significantly higher expression has been found in flowers of L. appelianum (Fig. 5). SHP1 is unexpectedly expressed at a significantly lower level in L. appelianum flowers. This may be due to the fact that YAB3 [24,30] (Fig. 5) was expressed at a significantly lower level in L. appelianum. YAB3 does not only activate expression of FUL but also that of SHP1 and SHP2 together with JAG and FIL [24]. Hence, a lower level of YAB3 does not only lead to a lower expression of FUL and to less repression of SHP1 and SHP2 by FUL but also to less activation of SHP1 and SHP2 by YAB3. The contrasting effects of YAB3 and FUL on the expression of SHP1 and SHP2 may lead to the observed overall similar expression of the SHP genes in L. campestre and L. appelianum but may still lead to differences in dehiscence due to different expression domains as  Table 7 Number of binding sites of TFs found to be DDEGs to the promoters of known fruit development genes described in [24]. yab3 single mutants do not have any major defects in dehiscence but fil yab3 double mutants are largely indehiscent [24]. Hence, decreased expression of YAB3 in L. appelianum as compared to L. campestre may have been an important factor for the evolutionary shift from dehiscent to indehiscent fruits in L. appelianum. This also shows that there was not only a change in the control of valve margin identity genes but also of the valve identity genes and shifts the causative mutation further upstream in the gene regulatory network of fruit development.

MiR165 is differently expressed in fruits versus flowers
Our smallRNA sequencing revealed that the miRNA homologous to miR165a-3p, miR165b, miR166a-3p, miR166b-3p, miR166c, miR166d, miR166e-3p, miR166f and miR166g [45] is encoded by a DDEG when comparing fruits and flowers. Targets of miR165 and miR166 are the mRNAs of HD-Zip transcription factors like PHABULOSA (PHB), REVOLUTA and PHAVOLUTA [46]. Recently, a function of the miR166-PHB module in anther dehiscence has been elucidated [74]. Upregulation of miR166 in the jba-1D mutant leads to downregulation of its target gene PHB which results in increased expression of SPOROCYTELESS/NOZZLE (SPL/NZZ). jba-1D mutants do not develop a dehiscence zone in anthers, i.e. overexpression of miR166 leads to indehiscence of anthers. Expression of miR166 in fruits is much higher in L. appelianum (indehiscent fruits) than in L. campestre (dehiscent fruits), while the opposite is the case in flowers (Fig. 6). Hence, miR166 may have a role in the development of indehiscent fruits in L. appelianum though the details of the regulation remain to be elucidated.

BRC1 and TCP4 as candidate genes for the evolutionary shift from dehiscent to indehiscent fruits
Our transcriptome analysis also identified seven genes encoding TFs belonging to DDEGs when comparing flowers and fruits (Table 6). PIF1 is a basic helix-loophelix (bHLH) transcription factor that negatively regulates chlorophyll biosynthesis [60]; it is involved in a variety of biological processes such as the repression of light-induced seed germination and chlorophyll accumulation in light [75]. RVE6 is a MYB protein that controls the pace of the circadian clock together with its close homologs RVE4 and RVE8 [63]. The zinc finger protein OBP4 functions in cell cycle progression and cell expansion [65] and is involved in root development [76,77]. So far, involvement of these three factors in flower and fruit development has, to the best of our knowledge, not been reported.
Two other MYB genes, MYB57 and TRY have also been found to be DDEGs (Table 6). MYB57 functions redundantly with MYB21 and MYB24 to regulate stamen development [78]. TRY controls the spacing pattern of trichomes, which are single-celled hairs [64]. Recently it has been found that TRY and other MYB genes of the regulatory network for trichome patterning have been modulated to trigger trichome development in fruits [79]. Hence, these two genes are known to function during flower and fruit development but association with fruit dehiscence is not known so far.
More interestingly, the genes encoding for the two TCP transcription factors BRC1 and TCP4 are DDEGs between fruits and flowers when comparing L. campestre and L. appelianum. Expression of BRC1 correlates with bud inhibition [38,80] but recently, it has been shown that BRC1 is neither necessary nor sufficient for bud inhibition [81]. Noticeably, it has been hypothesized that BRC1 may guide fruit morph determination in the dimorphic Brassicaceae plant Aethionema arabicum [71]. Ae. arabicum produces two fruit morphs on the same plant, one of which is dehiscent and the other one is indehiscent. qRT-PCR analyses showed that the expression of BRC1 in Ae. arabicum is high in flowers and decreases strongly Fig. 6 Expression data plot of the miRNA homologous to miR165a-3p of A. thaliana. miR165a-3p is identical or differs only by one nucleotide to miR165b, miR166a-3p, miR166b-3p, miR166c, miR166d, miR166e-3p, miR166f and miR166g such that they cannot be distinguished and hence are summarized here as miR165-3p. Bars indicate mean normalized count values of reads mapping to miR165-3p in the corresponding structure and species. Dark and light grey bars represent the mean values for L. campestre (Lc) and for L. appelianum (La), respectively. The error bars indicate the standard deviation in fruits of the indehiscent morph but remains at a low level in flowers and fruits of the dehiscent morph. We observe a very similar pattern in our transcriptome analysis for the indehiscent morph in L. appelianum and the dehiscent morph in L. campestre (Fig. 7). In Arabidopsis thaliana buds, BRC1 controls a transcription factor cascade that results in abscisic acid (ABA) accumulation [82]. It has been proposed that this cascade also plays a role in the development of indehiscent fruits in Ae. arabicum [83]. Thus the effect of BRC1 on fruit indehiscence in L. appelianum may be indirect via ABA.
TCP4 has been found to be involved in leaf and flower development as well as in seed oil biosynthesis in A. thaliana [62,84,85]. Furthermore, TCP4 directly activates the expression of miR167 which targets the TFs ARF6 and ARF8 [86]. This regulation has been hypothesized to be important for flower maturation, but may also be involved in fruit dehiscence as ARF6 and ARF8 are part of the gene regulatory network of fruit development [29] (Fig. 5). Another study found physical interaction of TCP4 and AS2 in yeast-two-hybrid experiments [87]. AS2 has also previously been found to be involved in fruit patterning [25] (Fig. 5). Our analysis of ChIP-seq data on ChIP-Hub [66] additionally revealed that TCP4 binds to the promoter of YAB3 (Table 7), which has been found to be differentially expressed between L. campestre and L. appelianum in all structures examined. In flowers, TCP4 is expressed at a higher level in L. appelianum than in L. campestre while the expression pattern is the other way round for YAB3. Hence, it is conceivable that TCP4 represses YAB3 in flowers.

Conclusions
Taken together, our study provides insights into the gene regulatory differences in fruit development between L. campestre producing dehiscent fruits and L. appelianum forming indehiscent fruits. We confirm differences in the expression of the fruit development genes SHP2 and FUL between the two species and reveal the importance of the valve identity gene YAB3 for fruit indehiscence in L. appelianum. We uncover the microRNA miR166 and the TCP transcription factors BRC1 and TCP4 as new candidates for causing the evolutionary transition from dehiscent to indehiscent fruits in L. appelianum.

Plant material, RNA extraction and sequencing
Seeds of Lepidium appelianum (KM 1754) were obtained from J Gaskin, USDA, Fremont County, Wyoming, USA and seeds of L. campestre (KM 96) were aquired from the Botanical Garden of the University of Zürich. All seeds were subsequently mass propagated in the Botanical Garden of Osnabrück University, Germany. Seeds from these mass propagations were sawn on a mixture of seedling substrate (Kammlott, Kammlott GmbH, Erfurt, Germany)/sand/ vermiculite (1-3 mm) (8:1:1) which was supplemented with 1 g L −1 each of Osmocote mini (Scotts Miracle-Gro Company, Marysville, OH, USA) and Triabon (http:// www. compo-expert. com, COMPO Expert GmbH, Münster, Germany). The seeds were placed for 4 days at 4 °C for stratification and then put in the greenhouse under a light-dark cycle of 16 h light, 8 h dark of artificial light, plus daylight. After 5 weeks in the greenhouse, the plants were vernalized for at least 13 weeks at 4 °C with a light-dark cycle of 8 h light, 16 h dark. After vernalization, the plants were put back in the greenhouse. Plant material was harvested from two batches of independently grown plants 3 to 5 weeks after the end of vernalization. Plant material was harvested between 12 and 4 pm to minimize the effect of circadian rhythm. Late flower buds, flowers and early fruits were harvested and immediately frozen in liquid nitrogen. Three samples were taken for each species and each structure resulting in 18 samples in total. For each sample, about 100 mg plant material was pooled from four individual plants. The material was pulverized in liquid nitrogen using a mortar and pestle. RNA was extracted using Qiazol (Qiagen) according to the manufacturer's instructions. RNA quantity and quality was checked by gel electrophoresis. The samples were sent to the Vienna BioCenter Facility for Next Generation Sequencing where they were quality checked and sequenced on a HiSeqV4. For mRNA sequencing, 125 bp paired-end reads were produced and 50 bp single-end reads were generated for small RNA sequencing.

De novo assembly
To simplify de novo assembly, also duplicate reads, i.e. reads with the exact same length and sequence, were removed. The remaining reads were assembled using Trinity [32] with default settings for the two species L. campestre and L. appelianum separately. To identify remaining contamination in the transcriptome, a BLASTn search was conducted against the nucleotide sequence database of NCBI (nt) using the transcripts in the assembly as query with the option -max_target_ seqs 5. Transcripts for which the best BLASTn result came from a non-plant species and had an eValue of E < 10 -10 were removed from the transcriptomes. The completeness of the assembled transcriptomes was evaluated using the Benchmarking Universal Single-Copy Orthologs tool BUSCO [34] and their accompanying dataset for eudicotyledons with 2121 groups (odb10).

Separation of chimeras in the assemblies
The initial assemblies contained chimeras composed of two different transcripts. As these chimeras often result from misassembly [33], we sought to separate chimeras into their separate transcripts. To recognize chimeras, we first conducted a BLASTn search [91] using the transcripts from the Lepidium transcriptomes as query and the cDNA sequences of the representative A. thaliana gene model as provided by TAIR10 (TAIR10_ cdna_20110103_representative_gene_model_updated. fasta) as database saving the best two subjects (i.e. A. thaliana cDNAs) for each query (i.e. for each transcript from the Lepidium transcriptomes) (Fig. 8). Using a customized perl-script (Supplementary Data S1) we searched for instances where the two subjects fitted to different regions of the query (i.e. one part of the Lepidium transcript has a BLAST hit corresponding to one A. thaliana cDNA while another part of the same transcript has a BLAST hit corresponding to another A. thaliana cDNA). These instances likely indicate chimeric Lepidium transcripts. To identify the best position to split the chimeras, we considered at which position and to what extend the A. thaliana cDNAs matched to the Lepidium transcripts as shown in Fig. 8. Chimeras were split if the overlap was less than 150 nucleotides either in the middle of the overlap or at the positions corresponding to the corrected end and beginning of the involved transcripts (Fig. 8).

Identification of miRNAs in smallRNA libraries
SmallRNA reads were mapped to mature miRNAs from Arabidopsis thaliana as downloaded from miRBase [35] employing bowtie2 with the settings -N 1, -L 18. If a read mapped to a specific miRNA from A. thaliana this miRNA was considered to be present in the corresponding Lepidium species. Mature miRNAs only differing by one nucleotide were combined to avoid multiple mapping during read-counting. To identify novel miRNAs in Lepidium, we used ShortStack [92] with the parameters --foldsize 500, --dicermin 18 and the trinity transcriptome assembly of the corresponding Lepidium species as reference "genome". For L. campestre, ShortStack was run a second time, this time using the genome sequence of L. campestre as available at the National Centre for Biotechnology Information [93] as reference. The stem-loop sequences classified as "N15" or "Y" by ShortStack were used as query sequences for BLAST searches of pre-miRNAs of A. thaliana as downloaded from miRBase [35] to distinguish known from novel miRNAs. The stem-loop sequences classified as "Y" by ShortStack without similarity to pre-miRNAs of A. thaliana were classified as novel Lepidium miRNAs.

Determination of orthologs
For transcriptome data, putative ortholog pairs were determined using a reciprocal best hit approach as follows. BLASTn searches were conducted using the transcriptome assembly with chimeras separated of L. campestre as query and the transcriptome assembly with chimeras separated of L. appelianum as subject and vice versa. For each transcript of L. campestre the best BLAST hits (i.e. all hits having the same eValue, score and alignment length) in L. appelianum were recorded and vice versa. If a transcript T c from L. campestre had the transcript T a from L. appelianum among its best BLAST hits and transcript T a from L. appelianum had transcript T c from L. campestre in its list of best BLAST hits, these were considered as best reciprocal BLAST hit. Best reciprocal BLAST hits with an alignment length of more than 250 nucleotides and where the length of the shorter sequence was at least 50% of that of the longer sequence were considered as putative ortholog pairs. Additionally, another BLASTn search was conducted using the transcripts in the transcriptomes as query and the Arabidopsis thaliana TAIR10 cDNA dataset as database. The set of putative orthologous transcript pairs was pruned such that only one transcript isoform was kept for each species unless different isoforms fitted to different A. thaliana genes. The isoform with the longest alignment length between the two species was chosen to be kept. This way, for each transcript in the one transcriptome exactly one transcript in the other transcriptome was kept. We refer to this dataset as the ortholog-transcriptome. The transcripts in the ortholog-transcriptome dataset were named using the TAIR10 identifier of the best BLAST result or numbered if no BLAST result was obtained this way.
For the miRNA data, orthologous miRNAs of the two Lepidium species were defined as those miRNAs fitting to the same miRNA from A. thaliana. Comparison of the novel Lepidium miRNAs revealed that none of these was found in both species.

Read mapping and feature counting
Preprocessed transcriptome and small RNA reads were mapped against ortholog-transcriptome and mature miRNAs, respectively, using bowtie2 (settings: Fig. 8 Schematic presentation of the detection and separation of chimeric transcripts in the Lepidium transcriptomes. The procedure is slightly different depending on whether the positions of the best two BLAST hits in A. thaliana cDNAs overlap on the Lepidium transcript or not. If the positions of the best two BLAST hits overlap by less than 150 nucleotides, the Lepidium transcript is split in the middle of the overlap. Otherwise, the beginning and end of the involved transcripts was determined based on the total length of the fitting A. thaliana cDNAs --very-sensitive-local --phred33 and settings: --phred33 -N 1 -L 18, respectively) [90]. A custom GFF was generated with one feature for each transcript and miRNA. Mapped reads per feature were then counted using HTSeq-count [42] with the settings -s no -t transcript -m union.

Differential gene expression analysis pipeline
Differentially expressed genes were identified using R (https:// www.r-proje ct. org/) and the Bioconductor packages edgeR [44] and DESeq2 [43]. Transcript counts were normalized with respect to transcript length. Lowly expressed transcripts with normalized counts and lowly expressed miRNAs with raw counts of less than 19 were discarded. Considering the two species L. campestre and L appelianum and the structures bud, flower and fruit, the following multi-factor design was used: species + structure + species:structure. A Likelihood Ratio Test (LRT) and a quasi-likelihood F-test were conducted in DESeq2 (command: DESeq(object, test="LRT", reduced=~species + structure)) and EdgeR (command: glmQLFit(object, design)), respectively to identify differentially expressed and differently differentially expressed genes. Only transcripts and miRNAs having a log-fold change to the base of 2 of more than 1 were considered. For DESeq2 the false discovery rate threshold α was set to 0.001.
For principal component analysis, count data was normalized using regularized logarithm with the option blind=FALSE in DESeq2 and the principal components were plotted using the plotPCA function in R.

GO enrichment analysis
Gene Ontology (GO) enrichment analysis was conducted on the GO website (http:// geneo ntolo gy. org/) using the PANTHER Overrepresentation Test [94]. The TAIR10 identifiers of the transcripts in the ortholog-transcriptome were provided as reference list. The TAIR10 identifiers of the transcripts which were identified as significantly differently expressed genes by both programs, DESeq2 and EdgeR, were provided as analyzed list. Arabidopsis thaliana was chosen as organism and "GO Molecular function complete" was selected as annotation data set. Enriched GO categories were determined using the Fisher's Exact Test with False Discovery Rate correction.
GO categories and terms were also determined using the AnnotationDbi in R. Transcripts associated with the term "DNA-binding transcription factor activity" were analyzed further.

Promoter analyses
Binding of transcription factors to the promoters of genes involved in fruit opening was analysed using ChIP-Hub (http:// www. chip-hub. org/). ChIP-Hub provides access to data on binding sites determined using chromatin immunoprecipitation followed by sequencing (ChIP-seq). On the ChIP-Hub website, A. thaliana was chosen as species and binding data was visualized on the WashU Epi-Genome Browser. For each fruit development gene, 1,500 nucleotides upstream of the translation start codon were investigated and each occurance of binding of one of the transcription factors found to be DDEGs was noted.

Statement
The study complies with relevant institutional, national, and international guidelines and legislation for plant ethics in the methods section.